Slider: Incremental Sliding-Window Computations for Large-Scale Data Analysis

Authors

  • Pramod Bhatotia
  • Marcel Dischinger
  • Rodrigo Rodrigues
  • Umut A. Acar
Abstract

Sliding-window computations are widely used for data analysis in networked systems. Such computations can consume significant computational resources, particularly in live systems, where new data arrives continuously. This is because they typically require a complete re-computation over the full window of data every time the window slides. Therefore, sliding-window computations face important scalability problems. In this paper, we propose techniques for improving scalability by performing sliding-window computations incrementally. In this paradigm, when new data is added at the end of the window or old data is dropped from its beginning, the output is updated efficiently by reusing previously run sub-computations, avoiding a complete recomputation. To realize this approach, we propose Slider, a novel framework that supports incremental sliding-window computations transparently and efficiently by leveraging the self-adjusting computation principles of dynamic dependence graphs and change propagation. We implemented Slider on top of the Hadoop MapReduce framework with a declarative SQL-like query interface, and evaluated it with a variety of applications and real-world case studies from networked systems. Our results show significant performance improvements for large-scale sliding-window computations without any modifications to the existing application code.
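The core idea of reusing sub-computations when the window slides can be illustrated with a small sketch. This is not Slider's implementation (which uses self-adjusting contraction trees inside Hadoop MapReduce); it is a minimal, hypothetical example that caches one partial aggregate per input chunk so that appending or evicting a chunk avoids re-scanning the unchanged portion of the window:

```python
from collections import Counter, deque

class IncrementalWindowCount:
    """Sliding-window word count that reuses per-chunk partial results.

    Illustrative sketch only: each chunk's partial Counter is cached,
    so sliding the window touches one chunk instead of the whole window.
    """

    def __init__(self):
        self.chunks = deque()   # cached per-chunk partial aggregates
        self.total = Counter()  # aggregate over the current window

    def append(self, records):
        """Add a new chunk at the end of the window."""
        partial = Counter(records)  # compute only the new chunk
        self.chunks.append(partial)
        self.total += partial

    def evict(self):
        """Drop the oldest chunk from the beginning of the window."""
        partial = self.chunks.popleft()
        self.total -= partial       # subtract the cached partial result

    def result(self):
        return dict(self.total)
```

A full recomputation would cost time proportional to the whole window; here each slide costs only the size of the chunk that entered or left.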


Related Articles

Asymptotic Analysis of Self-Adjusting Contraction Trees

In this report, we analyze the asymptotic efficiency of self-adjusting contraction trees proposed as part of the Slider project [2, 3]. Self-adjusting contraction trees are used for incremental computation [1, 4, 5, 8]. Our analysis extends the asymptotic efficiency analysis of Incoop [6, 7]. We consider two different runs: the initial run of a Slider computation, where we perform a computatio...


Window - based Data Processing with Stratosphere

Analyzing large amounts of ordered data is a common task in research and industry. The usual ordering domain is time: examples of time-ordered data are sensor data, communication network data, or financial data. Besides online monitoring, it is common to investigate patterns or special events in the data after capturing it. These analyses can traditionally be performed within Data Stream Manag...


A Novel method of Data Stream Classification Based on Incremental Storage Tree

Given the large volume, rapid change, and high cost of random access that characterize data streams, this paper proposes a Bayesian classification data mining algorithm based on an incremental storage tree to handle these problems. A sliding window processes the data stream and divides it into several basic units, and Principal Component Analysis (PCA) compresses the data from each window to produce dynami...


Incremental Computation Of Aggregate Operators Over Sliding Windows

The sliding window is the most popular data model for processing data streams, as it captures a finite and relevant subset of an infinite stream. This paper studies different mathematical operators used for querying and mining data streams. The focus of our study is on operators that operate on the whole data set, termed blocking operators. We have classified these operators according to ...
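For aggregate operators with an inverse (such as sum), a sliding window can be maintained incrementally without recomputing over the whole window: the entering element is added and the evicted element is subtracted. The helper below is a hypothetical illustration, not code from any of the listed papers:

```python
from collections import deque

def sliding_sums(stream, size):
    """Incremental sliding-window sums: O(1) work per slide.

    Illustrative sketch: addition is invertible, so the element leaving
    the window is subtracted from the running total instead of
    re-summing all `size` elements on every slide.
    """
    window, total, out = deque(), 0, []
    for x in stream:
        window.append(x)
        total += x
        if len(window) > size:
            total -= window.popleft()  # undo the evicted element
        if len(window) == size:
            out.append(total)
    return out
```

Non-invertible (holistic) aggregates such as median or distinct count do not admit this trick directly, which is one reason frameworks fall back on partial-result reuse instead.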


Large Scale Reinforcement Learning using Q-SARSA(λ) and Cascading Neural Networks

This thesis explores how the novel model-free reinforcement learning algorithm Q-SARSA(λ) can be combined with the constructive neural network training algorithm Cascade 2, and how this combination can scale to the large problem of backgammon. In order for reinforcement learning to scale to larger problem sizes, it needs to be combined with a function approximator such as an artificial neural n...



Journal:

Volume   Issue 

Pages  -

Publication date: 2012